concept similarity
Representational Similarity via Interpretable Visual Concepts
Kondapaneni, Neehar, Mac Aodha, Oisin, Perona, Pietro
How do two deep neural networks differ in how they arrive at a decision? Measuring the similarity of deep networks has been a long-standing open question. Most existing methods provide a single number to measure the similarity of two networks at a given layer, but give no insight into what makes them similar or dissimilar. We introduce an interpretable representational similarity method (RSVC) to compare two networks. We use RSVC to discover shared and unique visual concepts between two models. We show that some aspects of model differences can be attributed to unique concepts discovered by one model that are not well represented in the other. Finally, we conduct extensive evaluation across different vision model architectures and training protocols to demonstrate the method's effectiveness.
Large Language Models Relearn Removed Concepts
Lo, Michelle, Cohen, Shay B., Barez, Fazl
Advances in model editing through neuron pruning hold promise for removing undesirable concepts from large language models. However, it remains unclear whether models have the capacity to reacquire pruned concepts after editing. To investigate this, we evaluate concept relearning in models by tracking concept saliency and similarity in pruned neurons during retraining. Our findings reveal that models can quickly regain performance post-pruning by relocating advanced concepts to earlier layers and reallocating pruned concepts to primed neurons with similar semantics. This demonstrates that models exhibit polysemantic capacities and can blend old and new concepts in individual neurons. While neuron pruning provides interpretability into model concepts, our results highlight the challenges of permanent concept removal for improved model safety. Monitoring concept reemergence and developing techniques to mitigate relearning of unsafe concepts will be important directions for more robust model editing. Overall, our work strongly demonstrates the resilience and fluidity of concept representations in LLMs post concept removal.
Answering Instance Queries Relaxed by Concept Similarity
Ecke, Andreas (TU Dresden) | Peñaloza, Rafael (TU Dresden) | Turhan, Anni-Yasmin (TU Dresden)
In Description Logic (DL) knowledge bases (KBs), information is typically captured by crisp concepts. For many applications, querying the KB by crisp query concepts is too restrictive. A controlled way of gradually relaxing a query concept can be achieved by the use of concept similarity measures. In this paper we formalize the task of instance query answering for crisp DL KBs using concepts relaxed by concept similarity measures. We investigate algorithms for computing answers to such queries in the DL EL, their complexity, and the properties of the employed similarity measure, depending on whether unfoldable or general TBoxes are used.
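The idea of relaxing a crisp query by a similarity threshold can be illustrated with a toy sketch. Everything here is an assumption made for illustration: concepts are reduced to conjunctions of atomic concept names, similarity is plain Jaccard overlap, and the names `jaccard`, `relaxed_instances`, and the sample individuals are hypothetical. It is not the algorithm or the similarity measure studied in the paper, and it ignores TBoxes entirely.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of atomic concept names."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def relaxed_instances(query, abox, threshold):
    """Return {individual: best score} for individuals that are instances of
    some concept X with jaccard(query, X) >= threshold.

    Assumption: X ranges over non-empty conjunctions of the atomic concept
    names asserted for the individual, so the individual is trivially an
    instance of every candidate X.
    """
    answers = {}
    for individual, atoms in abox.items():
        names = sorted(atoms)
        best = 0.0
        for size in range(1, len(names) + 1):
            for combo in combinations(names, size):
                best = max(best, jaccard(query, frozenset(combo)))
        if best >= threshold:
            answers[individual] = best
    return answers


# Hypothetical assertions: each individual maps to its atomic concept names.
abox = {
    "anna": frozenset({"Person", "Student", "Employee"}),
    "bob": frozenset({"Person", "Student"}),
    "carl": frozenset({"Person"}),
}
query = frozenset({"Person", "Student", "Employee"})

crisp = relaxed_instances(query, abox, 1.0)    # only exact matches
relaxed = relaxed_instances(query, abox, 0.6)  # admits near matches
print(sorted(crisp), sorted(relaxed))
```

Lowering the threshold from 1.0 to 0.6 widens the answer set gradually, which is the controlled relaxation the abstract describes; the subset enumeration is exponential and only suitable for a toy example.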
Evaluating Semantic Metrics on Tasks of Concept Similarity
Schwartz, Hansen Andrew (University of Central Florida) | Gomez, Fernando (University of Central Florida)
This study presents an evaluation of WordNet-based semantic similarity and relatedness measures in tasks focused on concept similarity. Treating similarity as distinct from relatedness, the goal is to fill a gap within the current body of work in the evaluation of similarity and relatedness measures. Past studies have either focused entirely on relatedness or only evaluated judgments over words rather than concepts. In this study, first, concept similarity measures are evaluated over human judgments by using existing sets of word similarity pairs that we annotated with word senses. Next, an application-oriented study is presented by integrating similarity and relatedness measures into an algorithm which relies on concept similarity. Interestingly, the results show that metrics categorized as measuring relatedness correlate most strongly with human judgments of concept similarity, though the difference in correlation is small. On the other hand, an information content metric, categorized as measuring similarity, is notably the strongest in the application-oriented evaluation.
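To make the notion of a taxonomy-based concept similarity measure concrete, here is a minimal sketch of two commonly described measures, a path-based score and the Wu-Palmer score, computed over a tiny hand-made hypernym tree. The taxonomy `PARENT` and all function names are hypothetical, the tree is not WordNet, and the abstract does not say which specific measures the study evaluated, so this should be read only as an illustration of the general technique.

```python
# Hypothetical toy taxonomy: each concept maps to its single parent.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    b_chain = set(ancestors(b))
    for node in ancestors(a):
        if node in b_chain:
            return node
    raise ValueError("concepts do not share a root")


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1.0 / (1.0 + edges)


def wu_palmer(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(path_similarity("dog", "cat"), wu_palmer("dog", "cat"))
print(path_similarity("dog", "car"), wu_palmer("dog", "car"))
```

Both scores rank "dog" closer to "cat" than to "car" because the pair shares a deeper common ancestor. Information content metrics, which the abstract mentions, additionally weight concepts by corpus frequency and are not covered by this sketch.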